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Abstract — In this paper, a novel decoding algorithm for low- 
density parity-check (LDPC) codes based on convex optimization 
is presented. The decoding algorithm, called interior point decod- 
ing, is designed for linear vector channels. The linear vector 
channels include many practically important channels such as 
inter symbol interference channels and partial response channels. 
It is shown that the maximum likelihood decoding (MLD) rule for 
a linear vector channel can be relaxed to a convex optimization 
problem, which is called a relaxed MLD problem. The proposed 
decoding algorithm is based on a numerical optimization tech- 
nique so called interior point method with barrier function. 
Approximate variations of the gradient descent and the Newton 
methods are used to solve the convex optimization problem. In a 
decoding process of the proposed algorithm, a search point always 
lies in the fundamental polytope defined based on a low-density 
parity-check matrix. Compared with a convectional joint message 
passing decoder, the proposed decoding algorithm achieves better 
BER performance with less complexity in the case of partial 
response channels in many cases. 

Index Terms: LDPC code, linear vector channel, interior 
point algorithm, convex optimization 

I. Introduction 

The development of decoding algorithms for binary linear 
codes has been a central research theme in coding theory. Re- 
cent research activity on message-passing decoding algorithms 
has made remarkable progress, and is bringing a shift in the 
design principle of decoders from algebraic to probabilistic 
algorithms. The sum-product algorithm for low-density parity- 
check (LDPC) codes is a particular example. The combination 
of LDPC codes and the sum-product algorithm achieves a good 
trade-off between decoding performance and decoding com- 
plexity. Particularly for memoryless channels, this combination 
offers an almost satisfactory solution. 

Message passing decoding has not only been applied to 
memoryless channels, but also to channels with memory. 
Worthen and Stark [10] first presented the unified factor graph 
approach for the design of a decoding algorithm for channels 
with memory. The unified factor graph includes two graphs 
as its sub-graphs: the factor graph for an LDPC code (the 
so-called Tanner graph) and the factor graph representing 
the target channel. A message passing decoding algorithm 
is naturally derived from the unified graph. Considerable 
attention has been focused on this approach, with progress 
made on areas such as burst-error channels [11] and partial 
response channels [12]. 
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The present study instead examines, the possibility of us- 
ing an optimization approach for the decoding problems of 
channels with memory. We can view a decoding problem as 
an optimization problem, whereby most conventional decoding 
algorithms designed for binary linear codes can be regarded as 
algorithms for solving a combinatorial optimization problem. 
Let us consider the following example to make the succeeding 
discussion concrete. 

A codeword a; in a binary code C C F2 (F2 is the 
binary Galois field) is sent to an additive white Gaussian 
noise (AWGN) channel after binary (0,1) to bipolar (-1-I, -1) 
conversion 

r={l-2x) + z, (1) 

where z represents a Gaussian noise vector and 1 denotes 
the vector with all-1 element^ The maximum likelihood 
decoding (MLD) rule for this channel model is given by 

i^arg min ||r-(l-2a;')|P, (2) 

where || • || is the Euclidean norm. The rule is, indeed, 
a combinatorial optimization problem because the feasible 
set C is a discrete set with both a combinatorial and also 
an algebraic structure. We often utilize combinatorial and 
algebraic properties of the code to solve the MLD problem. 
For example, the Viterbi algorithm, which is a realization of 
dynamic programming, relies heavily on the trellis structure of 
the code. In general, the MLD problem is a computationally 
intractable problem when the code length is large, and so 
we need approximation algorithms that can yield sub-optimal 
solutions at a reasonable computational cost. The sum-product 
algorithm for LDPC codes [1] can be seen as one such 
approximation algorithm. 

A technique called relaxation is well known in the field 
of combinatorial optimization [2] as an approach for difficult 
combinatorial optimization problems. The basic idea of the 
relaxation is to relax the definition of the feasible set from a 
discrete set to a subset of an n-dimensional Euclidean space 
7?^". For example, integer Unear programming (ILP) can be 
regarded as a combinatorial optimization problem. Elimination 
of the constraint that "a feasible point is integral" leads to 
a linear programming (LP) problem that can be efficiently 
solved by the simplex algorithm or an interior point algorithm. 
Although the solution of the relaxed problem may be different 
from the solution of the original problem, the approximate 
solution can be effectively used to find the optimal solution 
[2]. 

The work on LP decoding due to Feldman [3] is the first 
application of the idea of relaxation in coding theory. An 

'In this paper, the elements in F2, i.e., {0,1}, can also be regarded as 
elements in TZ. Thus, the appropriate arithmetic (e.g., mod-2 sum or addition 
of real numbers) depend on the context. 
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important implication of this work on LP decoding is that the 
MLD problem can be relaxed to a linear optimization problem 
defined on TZ^\ Once a decoding problem has been relaxed 
to an optimization problem, we can then exploit an efficient 
numerical optimization technique to solve the relaxed problem. 

Hitherto, most of the research activity on LP decoding has 
focused on memoryless binary-input output-symmetric (BIOS) 
channels. This may be because the formulation of the LP prob- 
lem is strongly dependent on the memoryless property and the 
BIOS assumption. It is therefore challenging and meaningful 
to consider a relaxed MLD problem for channels with memory. 
For example, channels with inter-symbol interference (ISI) and 
partial response (PR) channels are of practical importance. 
The development of an efficient decoding algorithm for such 
channels is an important problem in coding theory. Recently, 
Taghavi and Siegel [5] showed that a new relaxation method 
for PR channels converts the MLD problem for PR channels 
to a linear programming problem. This work indicates that 
the relaxation approach is effective not only for memoryless 
channels, but also for channels with memory. 

In this paper, a novel decoding algorithm for LDPC codes 
based on convex optimization is presented. The decoding 
algorithm, called interior point decoding, is designed for linear 
vector channels. The linear vector channels include many 
channels of practical importance, such as ISI channels and PR 
channels. It is shown that the MLD rule for a linear vector 
channel can be relaxed to a convex optimization problem, 
which is called a relaxed MLD problem. Approximate vari- 
ations of the gradient descent and the Newton methods are 
used to solve the convex optimization problem. 

In a decoding process of the proposed algorithm, a search 
point always lies in the fundamental polytope [3] defined based 
on an LDPC matrix. The merit function to be minimized 
in a decoding process consists of two parts: an objective 
function and a log-barrier function. The objective function is 
the distance between the received vector and a point in the fun- 
damental polytope. The log-barrier function corresponds to the 
constraints on the fundamental polytope, where this function 
is used so that the trajectory of the search points does not 
get close to the boundary of the fundamental polytope. Error 
analysis based on the geometrical properties of a polytope is 
also presented. The decision regions of a relaxed ML decoder 
can be characterized by normal cones of the affine image of 
the fundamental polytope. 

The contents of the paper are organized as follows. In Sec- 
tion 2, basic notations and definitions are introduced. Section 
3 presents a geometrical view of the relaxed MLD problem 
for linear vector channels. Section 4 gives an overview of the 
proposed algorithm. The details of the optimization methods 
are explained in Section 5 (on the gradient descent method) 
and 6 (on the Newton method). Section 7 includes simulation 
results which show the behaviors of the proposed algorithm. 
Section 8 gives conclusions. 

II. Preliminaries 

In this section, we first introduce the definitions of the 
linear vector channel and the MLD problem for the linear 



vector channels, before then discussing the relaxed decoding 
problem. 

A. Linear vector channels and MLD rule 

Suppose that is a binary mxn matrix and C is the binary 
linear code defined based on H, 

C={xe F2" : Hx = 0}. (3) 

The matrix _ff is a row-regular sparse matrix whose rows have 
row weight Wr > 3 (In this paper, a bold-face symbol, for 
example x, denotes a column vector). 

A linear vector channel, which is the main channel consid- 
ered by the present study, is defined as follows. 

Definition 1 (Linear vector channel): A sender first 
chooses a codeword a; e C according to the message that 
he/she wishes to send. The codeword is transmitted to the 
channel and then a receiver obtains a received vector r e 7?.": 

r = Ax + b+ z, (4) 

where ^ is a non-singular n x n real matrix (called an 
interference matrix) and b is a real column vector of length n 
(called an offset vector). Note that the elements of F2, {0, 1}, 
are also considered to be elements of TZ in Q. It is assumed 
that both A and b are known by the receiver The vector z 
denotes an additive noise vector This channel model is called 
a linear vector channel. q 

The offset vector b is introduced to represent conversion of 
a code-representation (on F2) to a signal-representation (on 
TZ). For example, in the case of channel ([T]i, 1 corresponds to 
the present b, and is used for the binary to bipolar conversion. 
Strictly speaking, the channel defined in Q should be called 
an affine vector channel, because the transmitted signal s is 
generated by an affine transformation s = Ax + b from a 
codeword x. However, we use the more conventional name 
for this channel since the offset vector is not essential for the 
following discussion, and it can be seen as a part of noise or 
the mean of the noise. 

If the vector z is an additive white Gaussian noise vector 
whose ith element Zi{i E [1, n]) has mean and variance ct^, 
the channel is called a Gaussian linear vector channel. Note 
that the notation [a, b] denotes the set of consecutive integers 
from a to 6. 

The class of linear vector channels is wide, and includes 
many channels of practical interest, such as the AWGN 
channel, ISI channels, and PR channels. In order to achieve 
the best decoding performance with respect to the block error 
probability, we need to perform MLD for this channel. The 
following definition gives the MLD rule for linear vector 
channels. 

Definition 2 (MLD rule for linear vector channel): 
Assume that the sender transmits a codeword in C and the 
receiver observes r e TZ'"' as the received word. The MLD 
rule for a linear vector channel is given by 

X = aig mm d{{ Ax + b),r) (5) 

The function d{-,-) isa distance function defined on TZ" which 
matches the probability density function of the noise vector 
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The vector a; e C is the estimation word obtained from the 
MLD process. q 
Example 1: For the case of a Gaussian Hnear vector chan- 
nel, the noise vector z is distributed according to an n- 
dimensional Gaussian distribution and its covariance matrix is 
diagonal (i.e., i.i.d. case). In this case, we have the following 
MLD rule: 



X = are mm r 

xec 



where 



{Ax + b)f, 
represents the Euclidean norm defined by 



(6) 



(7) 



If the noise is correlated (i.e., colored Gaussian noise), we 
can derive the MLD rule for such a channel that includes the 
(inverse of) covariance matrix of the nois^ 

Although the MLD rule (j5]l gives us an optimal estimation, 
its computational complexity is of an exponential order of the 
code length n. This prevents the use of MLD in practical appli- 
cations, and there is thus a requirement for an approximation 
of the MLD rule in order to reduce this computational cost. 



B. Relaxed MLD rule 

We here introduce a relaxed MLD rule which is an approx- 
imation of the MLD rule (|5]). This relaxed rule is the basis 
of the interior point decoding to be presented in the latter 
sections. The basic idea of the relaxed rule is to relax the 
domain of x from C to a fundamental polytope, introduced 
by Feldman [3], that is defined on the basis of the parity check 
matrix H. The fundamental polytope is a polytope contained in 
7?." that is a relaxed polytope of the convex hull of C. Thus, 
the set of vertices of the fundamental polytope contains all 
the codewords of C. This relaxation approach thus yields the 
possibility of using a minimization algorithm working on 7?." 
(e.g., a gradient descent algorithm or the Newton method) as 
a decoding algorithm. In the following discussion, we assume 
that the distance function d{x,y) is a convex function with 
respect to the variable x, and that it is a differentiable function. 

The definition of the fundamental polytope is given as 
follow^ 

Definition 3 (Fundamental polytope and its interior set): 
Let Ai — {j G [1, n] : hij = 1}, for i G [1, m] where hij is the 
(i, j)-element of H. The set Ti{i £ [1, m]) is the set of all the 
subsets of odd size in Ai, namely Ti = {S C Ai : \S\ is odd}. 
The constraints for x ~ {xi,X2, ■ ■ ■ , Xn) G 7?.": 



V^e [i,m],V5er„ i + V(xt-i) 



teAi\s 



and 



Vje[l,n], 0<Xj<l 



xt<0, (8) 



(9) 



-For such a channel, the appropriate distance function becomes quadratic 
form. 

^Note that although the definition of the fundamental polytope given here 
may appear at first glance to be somewhat different from that given in [3], 
the two definitions are equivalent. 



are called the parity constraints and the box constraints, 
respectively. The fundamental polytope V is the polytope 
defined by 



V ^ {x e 7^" : X satisfies both constraints ^ and 

(10) 

The interior set of V, denoted by V*, consists of the points 
satisfying 

ViG [l,m],V5er„ ^ xt <0, (11) 

tes t£Ai\s 



and 



Vj G [l,n], <Xj < 1. 



(12) 



These constraints are also called the parity constraints and 
the box constraints, respectively. A point x E TZ" is called a 
feasible po/nQ iff x eV* . □ 
As described before, the set of vertices of the fundamental 
polytope contains all the codewords of C. Such vertices are 
called codeword vertices. Note that, in general, the set of 
vertices also contains non-codeword vectors, which are called 
non-codeword vertices. These non-codeword vertices become 
the main source of sub-optimality in decoding performance 
of a decoding algorithm based on the fundamental polytope 
such as LP decoding. The fundamental polytope T' is a convex 
set defined by a set of m2'^''^^ parity inequalities and n box 
inequalities. 

We are now ready to discuss a relaxation of the MLD rule 
for linear vector channels. The relaxed MLD rule based on the 
fundamental polytope is given by the following definition. 

Definition 4 (Relaxed MLD rule for linear vector channel): 
Let r = Ax + 6 + 2; be a received vector from a linear vector 
channel. A relaxed MLD rule is defined by 



arg min d{{Ax - 
xev 



b),r). 



(13) 



□ 



Note that, as shown in ([13 1, the domain of x has changed from 
C in the non-relaxed rule ( 5]l to T' in the relaxed rule. Since the 
fundamental polytope V is convex and the objective function 
d{{Ax + b), r) is a convex function, the optimization problem 
( [T3] ) can be regarded as a convex optimization problem [4]. 
This observation motivates the use of numerical optimization 
techniques for convex optimization, such as the interior point 
algorithm [4] as a decoding algorithm. For the case of the 
Gaussian linear vector channel, the relaxed MLD rule is given 
by 



X = are min I |r — (Ax 
xev 



(14) 



Of course, the solution of the relaxed MLD problem may 
not be the solution of the original MLD problem because 
the fundamental polytope has vertices which do not belong 
to C. Moreover, the optimal point may not be a vertex of V. 
However, this compromise on the sub-optimality of the relaxed 
MLD rule leads to a large reduction in the computational 
complexity of the decoding. 

"^In a decoding process by interior point decoding, only points in the interior 
set of the fundamental polytope are admissible. This is the reason why we 
say X £ V* is a feasible point. 
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III. Geometrical view of relaxed MLD problem 

FOR ERROR ANALYSIS 

Error analysis of the relaxed MLD, which has a close 
relationship to the geometrical properties of the fundamental 
polytope, is important to clarify the difference between the 
true MLD and the relaxed MLD. In this section, geometrical 
properties of the relaxed MLD problem will be discussed. 

A. Mapped polytope 

It may be helpful for us to obtain a geometric intuition of 
the relaxed MLD rule before discussing further details. Let 
V C TZ" be the fundamental polytope. Applying the affine 
map X 1-^ Ax + b to V, we obtain the image of V: 



Q = {Ax 



beTZ'' -.xeV}. 



(15) 



The set Q is also a polytope, which is called a mapped 
polytope. 

By using Q, we can rewrite the relaxed MLD rule in the 
following two-step process: 



s = arg min r) 
X = A-\s-b). 



(16) 
(17) 



Note that the interference matrix A is assumed to be non- 
singular and thus the inverse A"^ exists. Figure [l] illustrates 
the relation between V^Q^r and x. The dashed circle around 
r denotes the contours of the objective function d{s',r). This 
figure presents a case of mis-correction(i.e., x ^ x). 

Fundamental polytope Mapped polytope 

V 






/ Q 


s = 

...... ( 


Ax + b 
S 



The normal cone defined below is the essential basis of such 
a necessary and sufficient condition. 

Definition 5 (Normal cone): Let T C 7?." be a polytope. 
Suppose that x <^ T . The normal cone at x is defined by 

Njr{x) ^{ye 7^" : y\x' ~x)<0, Vx' e T}. (18) 

The vectors belonging to Njr(x) are called the normal vectors 
of T al X. Q 
From the definition, it is clear that the normal cone is a 
closed convex cone including the origin 0". If x is included 
in the interior set of JF, Nyr[x) = {0"} holds because there 
exists an e-ball (e > 0) centered at x which is totally included 
in T. We next consider the case where a; is on a facet 5* of 
!F and let z be an orthogonal (normal) vector to S. It can be 
verified that tz^{x' — x) < Q holds for any x' e T, where 
t is a non-negative real number. We can show that vectors 
tz are the only vectors that satisfy the inequality. This means 
that Njr{x) = {tz ■.t> 0}. Finally 15 we consider the case 
where a? is a vertex of the polytope T. Suppose that x is 
given as the intersection of facets Si, S2, ■ ■ ■ Sk- The vectors 
Zi, Z2, ■ ■ ■ , Zk are the normal vectors corresponding to these 
facets, respectively. In that case, we can show that 



y e {tizi 



,ife > 0} 



(19) 



A- 



satisfies y^{x' — x) <Q for any x' ^ T and vice versa. This 
leads to the following statement: 

Nr(x) ^ {hzi + ■ --tkZk ■.ti,h,...,tk>0} (20) 

holds if a; is a vertex of T. 

The shifted normal cone Njr{x) + a; is defined by 

N:p{x) + x = {y + x:yeN:F{x)}, (21) 

which is a shifted cone starting from x. Figure |2] illustrates 
the shifted normal cones for a two-dimensional polytope. The 
black circles denote the point x. The figure depicts the three 
cases discussed above (x is on (l)the interior set, (2)a facet, 
(3)a vertex). 



Fig. 1. Geometrical view of the relaxed MLD process. 



B. Norm.al cone 

We here review the linear vector channel model again. The 
received vector r e 7?." is given by r = Ax + 6 + 2;. The 
transmitted vector a; is a (codeword) vertex of a fundamental 
polytope. The additive noise vector z E 7?." is assumed to be 
generated according to the probability density function p{z). 

In order analyze the block error performance of the relaxed 
ML decode:[5 a necessary and sufficient condition for the op- 
timal points of the relaxed MLD problem must be established. 




Fig. 2. Shifted normal cones. 



^We here assume the optimal relaxed ML decoder can solve the relaxed 
MLD problem exactly. 



^We here omit the case where x is on the ridge (or edge) of J^. This case 
is similar to the case where £c is on a vertex. 
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C. Optimality condition and decodable noise region 

The next lemma is the basis of the proof of the Karush- 
Kuhn-Tucker condition for convex optimization problems. 

Lemma 1 (Condition for global optimum): Assume that 
T C 7?." is a convex set. Let f{x) be a convex function to 
be minimized subject to the convex constraint x ^ T . We 
assume that /(x) is differentiable at any a; e JF. If and only 
if 

-V/(a;*) e A/-^(a=*) (22) 

holds, X* is the global minimum of the convex optimization 
problem. 

(Proof) We omit the proof of this Lemma since it can be found 
in standard textbooks on non-linear optimization, such as [14]. 

□ 

This lemma can be used to specify the set of decodable noise 
using a relaxed ML decoder. For this purpose, it is convenient 
to introduce the set of decodable noise patterns. 

Definition 6 (Decodable noise region): Let r G 7?." be a 
received vector. For any s g Q, the decodable noise region 
i?(s) corresponding to s is defined by 



A 



D{s) = {r - s e 7^" : -V(i(s,r) e Nq{s)}. 



(23) 



□ 



The meaning of the decodable noise region becomes clear 
in the following corollary: 

Corollary 1: Assume that a; is a point in the fundamental 
polytope V. The received vector r = Ax + b + z E 7?." can 
be correctly decoded (i.e., x = x holds) using the relaxed ML 
decoder if z E D{Ax + b) holds. 

(Proof) From the assumption z = r — Ax — be D{Ax + 
b), and so together with the definition of the decodable noise 
region, we have 



Vd{Ax + b,r) e Nq{Ax + b). 



(24) 



Since the function ) is convex with respect to the first 
argument, we can use Lemma [T] to verify that the output from 
the relaxed ML decoder is s = Ax + b. Applying the inverse 
affine map, we can obtain the correct estimate: 

x = A-\s-b)=x. (25) 

□ 

The decodable noise region of x completely characterizes 
the block error probability when the vector x E V is sent. We 
here consider the probability of the event that the estimate 
X obtained from the relaxed ML decoder coincides with 
the transmitted codeword vertex x E V. This probability is 
expressed as expressed as 



Pcix) 



A 



Probja; = x} 
Prob{2: e D{Ax 



b)} 



(26) 
(27) 



p(^z)dzidz2 • ■ ■ dzn, (28) 

lD(^AX+b) 

where p{z) is the probability density function of the additive 
noise z. 

The next example considers the case of the Gaussian linear 
vector channel. 



Example 2: Assume that Zi is an independent Gaussian 
random variable with zero mean and variance cr^, that is, 
assume that the noise PDF p{z) is given by 



(27ro-2)"/2 



A 



exp 




(29) 



For this case, d{s,r) = ||r — s|p is used as the distance 
measure for the relaxed ML decoder It is easy to verify that 



— Wd{s, r) cx (r — s) 



(30) 



holds. From this proportional relation, we can obtain the 
equivalence relation 



D{s) = Nq{s). 



(31) 



Suppose that a codeword vertex a; G T' is sent and r = Ax + 
b+z is observed at the receiver side. By using the equivalence 



relation (31 1 and substituting for p(^z) from equation (29 1 into 



equation (28 i, we have the correct decision probability Pcix) 



of the relaxed ML decoder for the Gaussian linear vector 
channel: 



Pcix) = 



1 



exp 



E 

ie\l,n] 



3. 



A 



dz^ 



(32) 



where dz — dz\dz2 ■ ■ ■ dzn- □ 
This example shows that the error performance of the relaxed 
ML decoder is dominated by the shape of the set of normal 
cones NgiAx + b) where a; is a codeword vertex of the 
fundamental polytope V. 

D. Analysis for successive decoding 

It is highly desirable to perform another decoding process, 
which is called secondary decoding, after a relaxed MLD 
process. This is because the relaxed ML decoder may output a 
non-vertex point as the solution of the minimization problem. 
If such a non-vertex point is close enough to the transmitted 
vertex, it can be corrected with a secondary decoding process. 
Secondary decoding can thus improve the overall decoding 
performance. 

The successive decoding process is given as follows. 



Xi = are min diAx + b, r) 

XGV 



X2 



Tix,). 



(33) 
(34) 



The decoding rule ( [33] l is just the relaxed MLD rule, while the 
rule ( [34| corresponds to the secondary decoding. The function 
r : 7?." TZ^ represents the decoding function for secondary 
decoding. For example, F = (Fi, . . . , r„) which is defined 
by 

„ , . J 0, flj < 0.5 
^'("*) = i 1, a,>0.5 



(35) 



is a possible secondary decoding function. This function quan- 
tizes a fractional value in the output vector from the relaxed 
ML decoder. Another example of the secondary decoding is 
the built-in min-sum decoder implemented in the interior point 
decoding presented in the next section. 
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In the following analysis, the decision region of the sec- 
ondary decoding plays a crucial role. The decision region of 
X, where a; is a codeword vertex of V, is given by 



A 



A(a;) = {r e 7^" : x = T{r)}. 



(36) 



We assume that A(a;) and A{x') are disjoint if x ^ x'. In 
the following, we assume a Gaussian linear vector channel for 
simplicity. 

Definition 7 (Union of the shifted normal cones): The 
union of the shifted normal cones associated with the 
successive decoding is defined by 

T{x)^ y [Nq{Ax' + b) + {Ax' + b)]. (37) 

X'&A(X) 

The next corollary shows that the set T{x) can be consid- 
ered to be the decision region corresponding to the transmitted 
vector X. 

Corollary 2: Assume that a; e is a codeword vertex and 
is sent to the channel. The receiver obtains r = Ax + b + z as 
the channel output. The successive decoder outputs the correct 
estimate x = x if r G T{x). 

(Proof) The proof is almost same as the proof of Corollary [T] 
and so is omitted. q 
Figure |3] illustrates the decision region of the successive 
decoding. We can see that T{x) totally includes the shifted 
normal cone of x. This means that secondary decoding can 
improve the correct decision probability Pc{x) in the case of 
Gaussian linear vector channels. 




y T(x) \ 



Fig. 3. Decision region of the successive decoding. 

Due to Corollary |2] error analysis of the successive decoding 
can be divided into several sub-problems: (i) analysis of the 
local structure of the fundamental polytope, (ii) analysis of 
the the decision region A{x) and (iii) numerical evaluation 
of the error probability (via multi-dimensional integration 
or bounds). Of course, these sub-problems are not easy to 
solve for long binary linear codes. However, the geometrical 
perspective established in this section will become a solid 
basis of further error analysis and it helps us to understand 
the behavior of a sub-optimal relaxed ML decoder. 



IV. Overview of Interior Point Decoding 

Although the relaxed MLD problem is easier to solve 
than the original MLD problem, it is still a computationally 
difficult problem and we must consider techniques to solve 
it efficiently. Since the relaxed MLD problem can be seen 
as a convex optimization problem, it is reasonable to apply 
conventional optimization techniques to this problem. In this 
section, the interior point decoding algorithm for linear vector 
channels will be presented. The proposed decoding algorithm 
is based on the idea of the interior point algorithm for convex 
optimization [4]. 

In this section, we first briefly review the idea of the interior 
point algorithm based on a barrier function method, before 
describing the overall structure of the proposed decoding 
algorithm. 

The details of the sub-procedures required for the algorithm 
are explained in the subsequent subsections. 

A. Barrier function method: a brief review 

The interior point algorithm describes a class of opti- 
mization algorithms for solving LP and convex optimization 
problems. Many sophisticated interior point algorithms have 
been developed [4], here we explain the simplest algorithm 
which is based on a barrier function method. 

Let f{x){x € TZ") be a real-valued function to be mini- 
mized. Assume that f{x) is a convex function and the feasible 
region is a convex subset of 7?.". A convex optimization 
problem is a minimization problem with a constraint of the 
form 

minimize f{x), s.t. x E !F. (38) 
The key idea of the barrier function method is to convert 



the original convex optimization problem (38i into an uncon 



strained optimization problem by using a barrier function. Let 
B{x) be a barrier function which has the following properties: 
(i) B{x) takes a finite real value if x E T* where !F* is the 
interior set of the feasible region JF; (ii) B{x) = oo if a; ^ JF*; 
(iii) B{x) is differentiable and convex. Combining the original 
objective function f{x) with the barrier function B{x), we 
have a new convex objective function which is called a merit 
function ip{x) = tf{x) + B{x). The parameter Hs a positive 
real number called a scale parameter. 

Thus the barrier function method replaces the original 
problem p8| ) by the optimization problem 

minimize i!{x), s.t. x E TZ"'. (39) 

It is observed that the constraints in ( |38| ) are absorbed into 
the barrier function and thus problem ( [39| is an unconstrained 
minimization problem for a convex function ip{x). In order 
to solve this problem, we can therefore exploit an efficient 
numerical optimization algorithm such as the gradient descent 
method or the Newton method. 



Of course, the unconstrained problem ( 39 1 and the original 
problem (38 1 are not Identical, and the optimal point xl of 
problem ( 38 i will not, in general, coincide with the optimal 
point x'2 of (39 1. However, when t 
that Xn 



00, we can expect 



Xl because the effect from the barrier function 
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becomes relatively small in this limit. On the other hand, 
smaller values of t improve the rate of convergence of the 
solution. As t becomes larger, the barrier function approaches 
a discontinuous function, which in general tends to retard the 
rate of convergence. 

A well-known recipe to obtain a point x'2 close to 
is to perform the gradient descent or the Newton method 
several times while t is gradually increased. The interior point 
algorithm based on the barrier function method consists of the 
two loops called inner and outer loops, respectively (see Fig|4] 
left). In the inner loop, i/'(a;) is minimized using the gradient 
descent or the Newton method. The aim of the outer loop is 
that of a scaling of t. In each iteration of the outer loop, t 
is multiplied by a positive constant a, and so the value of t 
increases as the number of outer iterations increases. 

The name "interior point" comes from the fact that search 
points (the tentative candidates for the optimal point) located 
on a trajectory moving towards the optimal point are always 
contained in T* (see Fig|4]right). For faster convergence, it is 
hoped that the trajectory of the search points does not approach 
the boundary of the feasible region. The barrier function is 
introduced to prevent a search point from approaching the 
boundary. 



Outer loop 
Inner loop 




min 'I'Hx) = mill itfi'x.) + i?(x)) 


1 




1 ■ 


t := at 








Fig. 4. Interior point algoritlim based on tlie barrier function method. 

B. Objective and merit functions for interior point decoding 
We here apply the interior point algorithm based on the 



barrier function method to the relaxed MLD problem (13 1. 
In the following, we will assume a Gaussian linear vector 
channel for simplicity, but the extension to the non-Gaussian 
(or correlated Gaussian) case is straightforward. The objective 
function to be minimized is, therefore, ||r — {Ax + 6)|p. 

Definition 8 (Objective function): The objective function 
f{x){x e 7?.") for the relaxed MLD problem is defined by 

f{x) = \\r — {Ax 

2 



E 




a.jXj + \ , (40) 

ie[l,n] \ Ve[i,n] 

where r.i and bi are ith elements of r and b, respectively. 
The symbol denotes the («, j)-element of the interference 



matrix A. 



□ 



The function f{x) is a differentiable, convex function, with 
these properties making it suitable for use in the the gradient 



descent and Newton methods. In the case of non-Gaussian 
linear vector channels, we can also define an appropriate 
objective function. 

Various choices exist for the form of the barrier function. 
In this paper, we adopt b{x) ~ — ln(— a;) as the basis of the 
barrier function, which is called a log-barrier function. It is 
clear that b{x) — 00 holds when x = 0; on the other hand, b{x) 
takes a finite value if a; < 0. Furthermore, b{x) is a convex 
function. Thus the function b{x) satisfies the requirements 
of the barrier functions as discussed above. Moreover, the 
derivative of b{x), namely. 



dx 



b{x) 



1 

X 



(41) 



is simple enough for implementation in a decoding algorithm. 
The log-barrier function including the parity constraints and 
box constraints is given below. 

Definition 9 (Log-barrier function): The log-barrier func- 
tion for the fundamental polytope B{x){x E 7?.") is defined 
by 



Bi-)=- E El- 

j6[l,m] SCTi 



- i + E(^*-i)- E ^* 

\ tes t£Ai\s 

ln[-(-a:,)]- ^ ln[-(x,-l)]. (42) 
je[i,n] ie[i.n] 



□ 

The log-barrier function B{x) inherits the properties of b{x); 
B{x) is a convex and differentiable function. Furthermore, if 
X ^ V* , then B{x) — 00 holds; otherwise B{x) < 00. 

The definition of the merit function including both the 
objective function and the log-barrier function is as follows. 

Definition 10 (Merit function): For any x G 7?.", the merit 
function t/i^*^ (a;) for the relaxed ML decoding problem is given 
by 

(43) 



A 



i,<'*\x)^tf{x) + B{x), 



where t is a positive real number. q 
The first term of ip'-^'>{x) is a scaled version of the objective 
function. The second term is the log-barrier function cor- 
responding to the fundamental polytope. Since the sum of 
convex functions is also a convex function, ■0'-*'' {x) is a convex 
function. Note that 'tp^'^^{x) takes a finite value if x £V*. 

C. Partial Response Channels 

In the field of magnetic recording, the PR channel model 
is often exploited as a basis of system design. Interferences 
arising from neighboring symbols are linearly superimposed 
on the current symbol, with these interferences increasing the 
complexity of the decoding process. 

Let S + 1-real numbers {ho, hi, . . . , hs} be the partial 
response (PR) coefficients. The parameter 5 is called the degree 
of the PR channel. The PR channel can be regarded as a 
discrete time finite impulse response (FIR) filtered channel 
with additive white Gaussian noises (see Fig|5]l. The definition 
of PR channel is given below. 
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Definition 11 (Partial response channel): A binary vector 
X = {xi,X2, ■ ■ ■ TXn) G C is transmitted to the channel 
defined by 

r]^^hd{l- 2xj^d) + Zj , j e[l, n] , (44) 
d=a 

where Zj{j E [l,n]) is an independent Gaussian random 
variable with mean and variance a^. By convention, we 
assume that 

1 

2' 



a;_(5_i) = .T_(5_2) = ■ ■ ■ = x-i ^ xq ^ (45) 



which simplifies the treatment of the boundary condition. The 
channel is called a PR channel. The signal to noise ratio ( snr 
) of the PR channel is defined by 



\d=0 / 



(46) 

□ 



Binary to bipolar 
conversion 



he 



hi 



^2" 



h^ 



t 

■© 



■e 



Fig. 5. Block diagram of partial response channels. 

As shown in Fig|5] the jth received symbol rj consists of 
the weighted sum of {1 — 2xj, 1 — 2xj_i, . . . , 1 — 2xj-s} and 
the noise term. It is clear that the PR channel can be expressed 
in the following linear vector channel form: 



r — Ax + b + z, 
where A and b are given by 



(47) 



/ho ••• 

hi ho 



A = -2 



\ 



■ ■ ho 



,b = 



ho / 



f ho \ 

hi + ho 



(48) 

The PR channels can be regarded as linear vector channels 
with a sparse interference matrix A. Thus, the PR channel is an 
ideal candidate for the interior point decoding. Throughout the 



paper, this channel will be used as an example of a Gaussian 
linear vector channel. 

For PR channels, the objective function J{x) takes the 
following form: 



f{x)^Y.^[r,- [Y,hd{l-2x,^d) 



(49) 



\d=0 



The partial derivative of tf{x) with respect to a variable 
Xp{p e [1,?^]) is required to compute the approximate gra- 
dient presented later From the objective function (49 1, we 
immediately have 

-tf{x)^tY,Tr-[r,-J2hdil^2x,^d)) 



j = l P \ d=0 



j=k \ d=0 I 

It can be observed that the number of additions required to 
evaluate equation (50i is proportional to (5^. This means that 
evaluation of the gradient Wtf{x) takes 0{n) computational 
time if S is constant. 

We next consider the Hessian of the objective function, 
which is required for the Newton method to be discussed later 
From (fSOll, we have 



_d_ 
dx. 



P+S / s \ 

-tf{x) = uY,hj^p[r,-Y.^d{l-2x,^d)\ 

' j=p \ d=0 J 



P+S S 

8i E E ^i-p^dXj-d + K 

j=p d=0 



(51) 



forp e [1, The symbol K denotes terms that do not contain 

Xk- Let q = j — d. The second derivative of the objective 
function is given by 



d 



P+S 



-tf{x) = 8t^/i,_j,/i,Vb'-9e [0,<5]].(52) 



By setting j — p ^ a, we obtain the following expression: 



d 



dXp X q 



tf{x) - 8tJ2haha+p-qI[a + p-qe [0,(5]](53) 



a=0 



for p e [1, n], q e [1, n]. 

D. Overall structure of interior point decoding 

The goal of the interior point decoding is to solve the re- 
laxed MLD problem by using an interior point algorithm based 
on the barrier function method. The interior point decoding 
consists of two nested loops: an outer loop and an inner loop. 
The inner loop corresponds to either an approximate gradient 
descent method or an approximate Newton method, both of 
which try to minimize the merit function i/''^*)(a;). The outer 
loop contains the following sub-procedures: inner loop, built- 
in min-sum decoding, and scaling of the parameter t. The 
details of these sub-procedures are discussed in the following 
subsections. 
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The procedure Interior Point{-) is the main procedure for 
the interior point decoding. The positive integer parameters 
Omax and I„iax Specify the number of iterations executed in 
the outer and inner loops, respectively. The parameter to is the 
initial scale parameter, which is a positive real number 



Procedure: b := InteriorPoint{r) 
Input: r G TZ" : received word 
Output: b £ F2 lj{*}: estimation word 
Step 1 Let X := (1/2, 1/2, ... , 1/2) and t := to. 
Step 2 Repeat the following sub-steps (2.1-2.4) Omax 
times: 

Step 2.1 Repeat the following process Imax times: 

X := InnerLoop{x,t). 

Step 2.2 Execute (p,b) :— MinSum(x). 
Step 2.3 If p = 0, then exit. 
Step 2.4 Let t := at. 
Step 3 Let 6 := * (* denotes decoding failure) and then 
exit. 



The heart of the decoding algorithm is the process called 
Inner Loop{-). This procedure updates a search point x using 
the gradient descent method or the Newton method so as to 
minimize the merit function i/)'^*)(a;). Since the search point 
X must always be contained in V* , we need to check the 
feasibility of the search point in this process. 

After the execution of the inner loop, built-in min-sum 
decoding is performed to obtain an estimate of the transmitted 
word. The role of the min-sum decoding is to find a codeword 
near to the current search point obtained from the inner loop 
process. 

In the interior point method, a search point must lie in the 
feasible region V* in all iterations of the procedure. The fol- 
lowing lemma justifies the choice of a; = (1/2, 1/2, . . . , 1/2) 
as the initial search point. 

Lemma 2: If 7? is a row-regular parity check matrix with 
Wr > 3, then the initial search point x = (1/2, 1/2, . . . , 1/2) 
is a feasible point, i.e., x eV*. 

(Proof) It is evident that x satisfies the box constraints. Thus, 
we only need to consider the parity constraints. For any i E 

[l,m],S £ Ti, we have 



-EG-' 

tes ^ 



1, 



1 - -\Ai 
2 2' 



1 - < 
2 



(54) 

using the assumption > 3 and row-regularity. This means 
that X also satisfies the parity constraints. q 



secondary decoding discussed in the previous sectiorj^ That 
is, it can compensate a non-vertex point obtained from an 
inner-loop process. Therefore, this built-in decoding process 
is indispensable for the interior point decoding technique. 

In the following, a brief description on the built-in min- 
sum decoding is given. Let Bj{i E [1,"-]) be the set of row 

indices such that Bj ^ {i E [l,m] : hij — 1}. The following 
procedure MinSum{x) is the standard log-domain min-sum 
algorithm with a dumping factor (also known as normalized 
min-sum algorithm [13]). 



Procedure: (p, b) := MinSum{x) 
Input: X eV*: current search point 

Output: (p, b): parity flag p and tentative estimate vector b 
Step 1 Compute the log likelihood ratios: 



A, := In ' 



(55) 



for j E [i,n]. Set :— for all pair 

satisfying j E Ai. 
Step 2 Repeat the following sub-steps (2.1-2.4) Lmax 
times. 

Step 2.1 For all pairs satisfying j E Ai, 

evaluate 



Vj^i := \j + ^ ^k^j. 

keBj \i 



(56) 



Step 2.2 For all pairs satisfying j E Ai, 

evaluate 



X min \r]k^i\. 

keAi\j 

The function sign(-) is defined by 



(57) 



sign(x) = | ^'^ (58) 

Step 2.3 For j E [l,n], decide the tentative 
estimate word b = {bi,b2, . . . ,b„) in the 
following way: 

(59) 

Step 2.4 If Hb = holds, then let p := and 
exit. 

Step 3 Let p := 1. 



, _ I - ^k&Bj ^''^j - ^ 



E. Built-in min-sum decoding 

The advantage of exploiting the built-in min-sum decoding 
process after the inner loop process is the consequent reduction 
in the required number of outer and inner iterations. If the 
current search point approaches close enough to a codeword 
vertex, the built-in decoding process can output the coiTe- 
sponding codeword as an estimate vector. In particular, we 
need not wait for the search point to converge to a codeword 
vertex, which in general requires a much longer computational 
time. Furthermore, the built-in min-sum decoding acts as the 



In the initialization part, the LLRs used in the min-sum 



decoding are computed as expression (55 1. Since < a;^ < 1 



holds for any j E [l,ri], we here regard the probability 

such that the jth transmitted symbol is 1 . The constant k which 
appears in expression (57 1 is the dumping factor {k ~ 0.7- 
0.9) which improves decoding performance of the min-sum 
decoding. 



'We can also use another decoding algorithm such as sum-product algo- 
rithm, bit-flipping algorithm instead of min-sum algoiithm. 
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V. Inner loop based on gradient descent method 

The gradient descent method is a well-known minimiza- 
tion method for an unconstrained convex function with a 
known first derivative. The convergence of the gradient descent 
method is relatively slow compared with methods that utilize 
second derivatives (i.e., the Hessian) of the objective function, 
such as the Newton method, but the gradient descent method 
is much easier to implement. 

A. Brief review on gradient descent method 

We here briefly review the gradient descent method. Let 
h{x),x e 7?." be a real- valued convex function to be min- 
imized. In the process of the gradient descent method, a 
search point gradually approaches the optimal point. For each 
iteration of the minimization process, the search point x is 
updated as 

X -.^ x~ sVh{x), (60) 

where Vh{x) is the gradient of h{x). That is, the search 
point moves in the descent direction —Vh{x). A positive real 
parameter s is called the step size parameter. 

The optimal choice of s, for a given x, is obtained by 
solving the following one-dimensional optimization problem: 

s — &xgmiiih{x — s'Vh{x)). (61) 

s'>Q 

This one-dimensional optimization process is usually called 
the exact line search. Other than the exact line search ( [6T| , 
some inexact line search methods such as the bisection scheme 
with backtracking [4] also exist. For the exact line search 
and some of the inexact line searches, it can be proved that 
the search point eventually converges to the optimal point [4] 
using the gradient descent update ( [60| . 

B. Approximate gradient descent method 

The procedure InnerLoop{-) returns a new search point 
which is computed from the current search point. In order 
to make the interior point decoding computationally tractable, 
some approximations are introduced. 

The aim of the procedure Inner Loop{-) is to find the 
minimum of 

using a gradient descent method. It 
uses the approximate gradient (instead of the true gradient 
VV'^^'H^)) ™d inexact line search scheme inside. Thus, 
strictly speaking, the procedure is an approximation of the 
gradient descent method. 

The following procedure is the main part of the inner loop, 
which can be regarded as an inexact line search method based 
on the bisection scheme. 

Procedure: x := Inner Loop{x,t) 
Input: X (^V*: current search point 
Output: x: updated search point 
Step 1 Set s ■- 1. 

Step 2 g := ApproxGradient(x,t). 
Step 3 Let x := x — sg 

Step 4 If IsFeasible{x) = 0, namely x ^ V* holds, then 

let s := s/2 and return to Step 3. 
Step 5 Let x := x. 



Since the vector x is an interior point of the fundamental 
polytope, there exits s > satisfying x sg E V* . This means 
that the loop composed of Step 3 and Step 4 must eventually 
stop. 

The process Inner Loop{-) includes the feasibility check in 
its procedure. Thus, it is guaranteed that the updated point 
is located in the feasible region V* . Figure |6] presents an 
example of a search step of Inner Loop{-). A search process 
is performed to find a feasible point on the half line starting 
from the current search point in the opposite direction to that of 
the approximate gradient vector For the case shown in Fig|6] 
the first temporary search point x is not in V* . Thus, the 
step size parameter s is reduced to s := s/2 to produce the 
second temporary search point, which is a feasible point. As 
a result, the second point becomes the next search point (i.e., 
the updated search point). 



1st trial :NG 

Q 




Fundamental polytope 



Fig. 6. A search step of Inner Loop{-). 

The procedure uses the approximate gradient g and does not 
include an evaluation of the merit function ip^^\x). Moreover, 
only a feasibility check is carried out in the procedure. 
Thus, we cannot expect the objective function to be a non- 
increasing function of the number of iterations in this case. 
This implies that the search point may not converge to the 
optimal point. However, due to this compromise, we obtain 
a huge reduction in computational costs. For example, the 
evaluation of the objective function, which requires a time 
computational complexity 0{2^''n), can be avoided in this 
procedure. In this paper, we assume that m scales proportional 
to n, namely m = (1 — < R < 1. In contrast, 

the execution of the feasibility check and evaluation of the 
approximate gradient take only 0{wrn) computation time. 
Note that the evaluation of the true gradient of the barrier 
function VB{x) requires 0(2'"''n) computation time. 

C. Feasibility check 

The purpose of the procedure IsFeasible{-) is to decide the 
feasibility of a given x e TZ". In the interior point method, 
a search point must lie in the feasible region in all iterations. 
Thus, the feasibility check is one of the key procedures of the 
inner loop. 
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The procedure IsFeasihle{x) returns 1 if x G V*\ oth- 
erwise it returns 0. Of course, we could evaluate all the 



inequalities (111 and (12 1 to decide the feasibility of given 
X. However, such a brute-force approach takes 0(2"'''n) 
computation time to check the feasibility because the number 
of inequalities (11'' owr-i^ 



m. However, we can do much 
better as shown in the next lemma and the algorithm. 

Lemma 3 (Feasiblity check): The point {xi,X2, ■ ■ ■ ^Xn) G 
7?." belongs to V* if and only if 6*^ < for any i G [1, m] and 
< < 1 for any j G [l,n] where 9i{i G [I,™]) is defined 
by 



A 



max 



les 



(62) 



(Proof) If di < for any i G [l,m] and < < 1 for any 
j G [l,n], then 



(63) 



les 



Procedure: / :— IsFeasihle(x) 

Input: X = {x\,X2, ■ . ■ ,Xn) G TZ": current search point 
Output: /: If X G P*,then / ~ 1; otherwise / := 0. 
Step 1 If there exists j G such that Xj do not satisfy 



Step 2 Repeat sub-steps (2.1-2.3) from i 
Step 2.1 For I £ Ai, set 



1 to m. 



(i) 



Xl 



1 < 
1 > 



'Xl 
-Xl 



(65) 



and evaluate 



1 + ^ ^ max{a;i 



1,-2;;}, (66) 



(0 



Kndv-Y.i(,A,Pi ■ 
Step 2.2 If V is even, then update u as u := u - 

min,gAj2a;!-l| andlet/?,'^'^^ := pf^^ ( 

1, where Imin ■= argminjgAi \2xi - l\. 

Step 2.3 If w > 0, then set / := and exit. 

Step 3 Set / := 1 and exit. 



The symbol denotes addition over F2. Let us consider 
the states of the variables after an execution of the above 
procedure. Let S' — {I Ai : I3\^^ — 1}. Suppose the case 
where v is odd. From this assumption, it is evident that 



is odd. The right-hand side of ( 66 1 can be rewritten as 



holds for any i G [l,m] and S 
maximum value of 1 + 'Ylies^^i 
implies that a: is a feasible point. 



C Tj, because Oi is the 

- 1) - Y.ieAAs'^i- ™s 



We then consider the opposite direction. Assume that x ^ 
V* ■ This means that some of the parity constraints or the box 
constraints are violated. If some of the box constraints are 
not satisfied, then there exists j* G [1,7^] such that Xj- is not 
in the open interval between and 1. If some of the parity 
constraints are not satisfied, there exists an index i* G 
and S* C T,. such that 



(64) 



From the definition of Oi, it is evident that Oi- > in such a 
case. This completes the proof. □ 

Lemma |3] is useful to reduce the computational time re- 
quired to carry out the feasibility check, because the com- 
putational time required to evaluate all 9i{i G [Lti]) is 
proportional to Wrm. In other words, 6i can be computed 
without generating all the subsets of odd size in T^. We can 
use an idea similar to maximum UkeUhood decoding for even 
weight code^ The following procedure includes such an idea 
for an efficient computation of 9i. 



1 + maxja;; — 1, -~xi\ 
leAi 

leS' li£A,\S' 



1 + max 

SCT, 



E(^' - 1) - E 

les ieA,\s 



(67) 



Thus, in this case, u = 6i holds. We then consider the case 
where v is even. In such a case, one of the components in 
{p'f^'^iizAi should be flipped so as to make the weight of 



the binary vector {p[''^}ieAi odd. Let V denote the index at 
which the bit flip occurs, namely (3^ := (3^ ® I. The bit 
flipping decreases the value of u to u — \xi — 1 — {—xi)\ — 
u — \2xi — 1|. Since the aim of the bit flipping is to find the 
optimal subset with odd size, it is reasonable to determine 
the index V according to the values of \2xi — 1|. Namely, 
tmin '■= argmin;g^. \2xi — 1| gives the smallest decrement 



(i.e., the largest value of u). Therefore, u 
this case. 



Oi also holds for 



D. Approximate gradient 

The gradient of ?/)'^*^(a;), which is defined on V* , is given 

by 



The Viterbi algorithm can also be used for evaluating 9i 



A 



^'^'\x) 



(68) 



12 



We have, after some manipulation, the partial derivative of 
■tp^^^x) with respect to the variable Xfe(/c g [1,?^]): 



d 
dxk 



oxk 



dxi 



B{x) 



E E 

ie[l,m] SCTi 



(^) 



1 

Xk 



1 



where 



(^) 



A 



Xk-l 

I[k G Ai\S] - I[k e S] 



(69) 



(70) 



for i G S C [l,n]. The notation I [condition] is the 

indicator function such that I[condition] = 1 if condition 
is true; otherwise, it gives 0. Note that the derivative of the 
objective function (HOb is given by 



-t ^ 2aik 
ie[i.i] 




The next example presents the gradient of the merit function 
of PR channels. 

Example 3: For the case of PR channels, we have the 
following derivative of the merit function: 



d 
dxk 



k+S 

uY,h,-k 



E E- 

i£[l,m\ SCT, 



J2hd{l-2x,^d)\] 

\d=0 ) ) 



{x) 



1 

Xk 



Xk~\ 



□ 

We can see that the partial derivative corresponding to the 
barrier function of the parity constraints contains 2"'''~^m 
terms. The computational time of computing V7/'^*''(a;) is 
therefore an exponential function of the row weight Wr- The 
computational cost of the gradient becomes the major obstacle 
to achieving a fast implementation of the interior point decod- 
ing when a given parity check matrix has a relatively large 
row weight. It is for this reason that we use an approximate 
gradient instead of the true gradient VV'*'*-'(^)- 

Definition 12 (Approximate gradient): The approximate 
gradient, denoted by g = (gi, . . . , (7„), is defined by 



A 



9k = t — f{x) 
oxk 



{x) 



Xk Xk-l 

for k G [1, n] and x G V* . The subset S*'*-* is defined by 



(72) 



' — are max 

SCT. 



1 + 



les 



ieAi\s 



From the definition of S*^' 
expressed as 5^*' — {I £ Ai 



xi , i £ [l,m]. 

(73) 

□ 

it is clear that S'*^*^ can be 



/3f) - 1} where {Pl'>}i^ 



is the vector obtained after an execution of IsFeasible{x). 



This means that evaluation of J^iel 



l,m] '''k 



{x) requires a 



computational time proportional to Wrm, which is much faster 
than the computational time required for the evaluation of 
XliG[i m] SscTi '''k '^\^)^ SO following the execution of 
IsFeasible{x), we can immediately evaluate the approximate 
gradient g using P^^K 

The approximate gradient is defined on a; G V* 
implies 



This 



1+ y(xi-i) 



E 



a;; < 



holds for any i G [1, to]. From this inequality and the definition 
of S^^\ it is easy to show that 



(x) 



> 



(74) 



holds for any k G [l,n],i G [1,to] and S C Ti{S ^ 
5''^'^). This inequality indicates that the approximate gradient 
includes the largest (in terms of absolute values) contribution 
for each i G [1 , to] . 

The following procedure ApproxGradient{-, •) efficiently 
evaluates the approximate gradient g. 



Procedure: g :— ApproxGradient{x,t) 
Input: X dV*: current search point 
Output: g: approximate gradient 
Step 1 Set 

, 9 ft ^ 1 



1 



dxk' ' ' Xk Xk 
for k G [1, n]. 

Step 2 Execute IsFeasible{x) to obtain S^^\i G [l,7n] 
Step 3 Repeat sub-step 3.1 from i := 1 to m. 
Step 3.1 Let 

{»,s(')) 



(75) 



9k 



>{x) (76) 



for k £ Ai 



The time complexity of each step is as follows. If the 
interference matrix A contains only 0{n) non-zero elements 
(i.e., A is a sparse matrix), the initialization process (Step 
1) takes 0(n)-time. As discussed before. Step 2 requires 
0{wrn)-\m\e. Finally, Step 3 needs 0{wrn)-\.\mt. In total, the 
evaluation of the approximate gradient takes 0(wrn)-time. 

VI. Inner loop based on the Newton method 

The Newton method is a numerical minimization algorithm 
with faster convergence than that of the gradient descent 
method. The convergence speed of the Newton method is 
known to be quadratic around the optimal point. It is thus 
appropriate to consider the Newton method for use as an 
optimization engine in the interior point decoding in order to 
achieve better decoding performance. However, we should be 
careful about the time complexity required for the execution of 
the Newton method, in which we need to handle the Hessian 
of the merit function. In general, evaluation of the Hessian of 
the merit function takes 0(n^)-time, and solving the Newton 
equation needs 0(ri'^)-time. The Newton equation is a linear 
equation Gd = — g, where G is the Hessian (nxn real matrix) 
at the current point and g is the gradient vector The vector d 
is called the Newton step. Thus, it is important to fully utilize 
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special structures of the merit function tp'-*\x); for instance, 
sparseness of the Hessian. 

A. Inner loop based on the Newton method 

The inner loop using the Newton method is shown below. 
Most processes are identical with the inner loop based on the 
gradient descent method. The differences are in Steps the 
evaluation of the approximate Hessian (Step 2), derivation of 
the Newton step (Step 3), and update of the temporary search 
point (Step 4). The details of the new processes introduced 
here are presented in the subsequent subsections. 



Procedure: x := Inner Loop(x,t) 
Input: X £V*: current search point 
Output: x: updated search point 
Step 1 Set s ■- 1. 
Step 2 Evaluate 

g :— ApproxGradient{x,t), 
G :— ApproxHessian(x,t). 

Step 3 Solve the Newton equation Gd = — g. 
Step 4 Let x ~ x — sd 

Step 5 If IsFeasihle{x) = 0, that is if x ^ V* holds, then 

let s := s/2 and return to Step 3. 
Step 6 Let x := x. 



X]ie[i m] X^scTi 'he left-hand side of equation (|77|l is 
replaced with a single summation X]te[i m] using 5(*rThis 
approximation has already been used to derive the approximate 
gradient. Due to these approximations, computation of G 
requires only 0(wr?^)-time if the interference matrix is sparse, 
i.e., the number of non-zero elements in A scales as 0{n). 
The process ApproxHessian{- , •) is the routine to evaluate 



the approximate Hessian G according to Definition 13 Thus, 
the details are omitted. 

The following example treats the case of the PR channel. 

Example 4: For PR channel case, the approximate Hessian 
G = {Gpq} has the following form: 



G 



pq 



S 

a=Q 



+p-q 



I[a + p-qe[0,S]] 



(x) 



I[p = q] 



[l.in] 

1 



1)^ 



(79) 



Figure |7] illustrates the configuration of the approximate Hes- 
sian. In this case, the matrix G becomes a symmetric Toeplitz 
matrix. Moreover, it has diagonal band structure as shown in 
the figure. q 



B. Approximate Hessian 

The second derivative of the barrier function B{x) is given 

by 



d 



B{x) 



ie[l,m] SCT, 



1 



1 



(77) 



(Xp 1)^, 

for p e [1, n], g e [1, n]. As in the case of the gradient descent 
method, a straightforward evaluation of {dB{x))/{dxpXq) 
needs 0(2'"''n)-time. This is one of the reasons why we will 
introduce the approximate Hessian given below. 

Definition 13 (Approximate Hessian of tl^^^^x)): The ap- 
proximate Hessian G = {Gpq}{p E [i,n], q G [!,"■]) is 
defined by 



Gti 



A 



d X J) X q 



[l,rn] 



lip - q] 



{xp^iy 



(78) 



for p g [l,n], q £ [l,n]. The subsets S^^\i £ [Ij'Ti]) are 



defined by (73 1. q 
We can see that the approximate Hessian includes a con- 
tribution from the (true) Hessian of the objective function 
f{x) and contribution from the approximate Hessian of the 
barrier function B{x). The approximate Hessian of B{x) has 
only diagonal elements; non-diagonal elements are discarded. 
Another approximation used here is that the double summation 



Contribution from the 
objective fuction _ 



Contribution from the 
barrier fuction — 




Fig. 7. Configuration of the approximate Hessian for PR channels. 



C. Solving the Newton equation 

The most time consuming part of the Newton method is the 
determination of the Newton step, which is required to solve 
the Newton equation Gd = —g. We here discuss how to solve 
the Newton equation efficiently in an inner-loop process. 

I) Cholesky decomposition: If the approximate Hessian is 
a positive definite symmetric matrix, Cholesky decomposition 
is applicable to solve the Newton equation. For a given 
n X n positive definite matrix M, Cholosky decomposition 
decomposes M as M = LL* where i is a lower triangular 
matrix. A linear equation Mx = b can be solved in an efficient 
way by using a combination of Cholesky decomposition and 
backward/forward substitutions [8]. 

In the case of PR channels, the approximate Hessian be- 
comes a symmetric positive definite matrix, and so we can 
employ Cholesky decomposition to solve the Newton equation. 
Fortunately, Cholesky decomposition can be accomplished 
with time complexity 0{n) for a matrix having the form shown 
in Fig|7] Since the backward/forward substitutions require only 
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0(ri)-time, the Newton equation can be solved with time 
complexity 0{n). This approach may be suitable for software 
implementation of the interior point decoding since Cholesky 
decomposition is a serial-type algorithm. 

2) Jacobi method: The Jacobi method is an iterative 
method for solving a linear equation Mx — b [8]. This method 
is especially suitable for the case where the coefficient matrix 
M is sparse. The details of the Jacobi method are as follows. 
The linear equation Mx = b can be rewritten in the following 
form: 



{L + U + D)x = b, 



(80) 



where L, U and D are lower triangular, upper triangular 



and diagonal matrices, respectively. Equation (80i can be 
transformed to 



X = D-\b~ {L+U)x). 



(81) 



We can regard equation ( [8T| l as an update rule of x and thus 
obtain the following recursive formula: 



x(k) =D-\b-{L + U)x^''-^^). 



(82) 



Starting from an appropriate initial value x^'^\ we can evaluate 
the above recursive formula iteratively. If certain conditions 
are met, a;'''") eventually converges to the solution of the linear 
equation Mx = b. 

Sometimes the Jacobi method fails to converge when the 
matrix M is not diagonally dominant. Under relaxation (UR^ 
of the Jacobi method yields convergent results for a wider 
class of linear equations, including those that cannot be treated 
with by the original Jacobi method. The update rule for the 
UR Jacobi method is given by 



X 



y 



(fc) = 

(fe) _ 



(fe) 



+ (1 - cj)a;('=-i) (83) 
D-^{b-{L + U)x^^-^^). (84) 



where a; is a real constant in the range < a; < 1. 

The Jacobi method with under relaxation is appropriate 
for application to solving the Newton equation. This method 
requires 0(n) -time if the coefficient matrix is sparse (the 
number of non-zero elements in the matrix is 0{n)) and the 
number of iterations is fixed. The Jacobi method is an algo- 
rithm of parallel type, and so should be suitable for a hardware 
implementation that can utilize its massive parallelism. 



D. Decoder architecture 

Figure |8] shows a possible hardware architecture of the 
interior point decoder using the Newton method. There are 
five major processing blocks given by approximate gradient 
computation, approximate Hessian computation, Jacobi solver, 
feasibility check, and built-in min-sum decoder. Every block 
is suitable for parallel implementation. In order to design a 
high-speed decoder, this parallelism must be exploited. 

'UR is known as Successive Over Relaxation (SOR) wlien 1 < u! < 2. 
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Fig. 8. A possible decoder architecture of tlie proposed interior point decoder. 



VII. Behavior of interior point decoding 

In this section, the behavior of the interior point decoding is 
discussed on the basis of the results of computer simulations. 
In the following simulations, a regular LDPC code with 
parameters n — 204, m — 102, Wc — 3,Wr — 6 is assumed 
where Wc and denote column and row weight, respectively. 
The code is due to MacKay [7]. The channels used in the 
simulation are PR channels. 



A. Objective function values as a function of number of 
iterations 

Figure |9] presents the average values of the objective func- 
tion f{x) as a function of the number of iterations. The two 
curves in Fig|9] correspond to the results of the proposed 
scheme obtained using the gradient descent-inner loop and the 
Newton-inner loop, respectively. The number of iterations is 
defined as the number of executions of the inner loop in the 
decoding process. These curves have been obtained by taking 
the average of 1000 trials (1000 codewords). 

The parameters of the decoder are as follows: Imax = 
5,Omax — 10, io — 5.0, Q — 2.0. The channel is the dicode 
channel (i.e., ha = 1.0, /ii = -1.0) with an SNR of 8dB. It 
may be observed that the average objective function values 
decreases rapidly in the first few iterations. The curve of the 
gradient descent method shows the fastest decrement when 
the number of iterations is small (such as 1-3) . However, 
the Newton method (with Cholesky decompositiorp*} gives 
smaller objective function values following the 4th iteration. 

'"since no evident difference in decoding performance between tlie Newton 
metliod witli Cholesky decomposition and that with Jacobi method has been 
observed, we assume the Newton method with Cholesky decomposition 
thi'oughout the section. 
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Fig. 9. Values of the objective function: gradient descent and Newton 
metliods. 



These results suggest that the proposed decoding algorithm us- 
ing the Newton method may require fewer iterations compared 
with that employing the gradient descent method. 

We next consider the balance of the number of inner/outer 
loops. In the following experiment, the product of the number 
of inner and outer loops is set to be 50. Three combi- 
nations {I,nax-,Or,iax) = ( 1 , 50) , (5, 10) , (10, 5) havc been 
tested, where the interior point decoding using the Newton 
method with Cholesky decomposition has been used. Figure 
10 presents the objective function curves for the three combi- 
nations, where the parameters of the decoder are included in 
the figure. We can see that the pair {Imax,Omax) = (5, 10) 
gives the smallest values after the 7th iteration. 
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Fig. 10. Balance of the number of inner and outer loops. 



Figure 11 plots the dependency on the scale parameter a. 
Again, the interior point decoding using the Newton method 
with Cholesky decomposition is adopted. It is observed that 



the convergence properties of this method are not so sensitive 
to the choice of the scale factor a. 
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Fig. 11. Dependency of convergence on scale factor a. 



B. Bit error probability of the interior point decoding 

In this subsection, the bit error probability curves of the 
interior point decoding obtained from computer simulations 
are presented. The code used in the simulations is a regular 
LDPC code with parameters n = 204, m = 102, Wc = 3, = 
6. 

In order to make a comparison with conventional decoding 
algorithms, we also obtained results using the joint message 
passing decoding (abbreviated as joint MPD) in this paper 
The block diagram of the joint MPD is presented in Fig 12 
The joint MPD consists of two parts: BCJR (Bahl, Cocke, 
Jelinek and Raviv) algorithm [9] for a PR channel and min- 
sum algorithm (with dump factor 0.7) for an LDPC code. 
The BCJR algorithm computes extrinsic information in the 
standard way and passes this information to the min-sum algo- 
rithm. The min-sum algorithm uses the output from the BCJR 
algorithm as a priori information. The extrinsic information 
generated from the min-sum algorithm is then treated as a 
priori information in the BCJR algorithm. The two parts of 
this method are iteratively executed in turn. We assume that 
the maximum number of iterations in the min-sum decoder is 
10 and that the overall iterations are limited to 20. For each 
iteration, a parity check procedure is executed; the iteration 
stops when the parity check passes. 

Figure [13] shows the BER (Bit Error Rate) curves for 
the 1 — D channel {ho = hi = 1). In this figure, the 
decoding performances of the interior point decoding using the 
gradient descent and the Newton methods are compared. For 
example, the label "gradient (5,5)" corresponds to the gradient 



descent method with {Imax^Omax) = (5,5). The parameters 
to = 5.0 and a = 2.0 are used in these simulations. The 
maximum number of iterations in the built-in min-sum decoder 



is L,, 



20 and the dump factor is 



0.7. Comparing 



16 



# of overall iterations = 20 



Extrinsic 
information 





BCJR algorithm for 
PR channel 








2''-states 






Min-sum algorithm for 
an LDPC code 






•« 



Extrinsic 
information 



dump factor = 0.7 
# of max. min-sum loop = 10 

Fig. 12. Block diagram of joint message passing decoding (joint MPD). 



the gradient descent (5, 5) and the Newton (5, 5) resuhs, 
Newton (5,5) exhibits much smaller bit error probabilities 
(approximately 2dB gain at BER = 10^'^) than those of 
gradient (5,5). This difference could be a consequence of the 
faster convergence of the Newton method. 



From Fig 13 it can be observed that a large improvement 
in BER can be obtained by increasing I„iax from 5 to 20 in 
the case of the gradient descent method. On the other hand, 
only negligible improvement is achieved by increasing Imax 
from 5 to 20 in the case of the Newton method. From these 
observations and other simulation results, we may be able to 
conclude that the interior point decoding using the Newton 
method requires at least 5 inner-iterations to achieve most 
of the potential performance. By contrast, the interior point 
decoding using the gradient descent method requires at least 
20 iterations (however, more than 20 iterations offers only a 
marginal improvement). 



10-^ 



gradient (5,5) 
gradient (20,5) 
Newton (5,5) 
Newton (20,5) 



10"' 



o- 10'" r 



10 



10- 




6 7 
SNR (dB) 

Fig. 13. BER curves of interior point decoding for I — D channel (1). 



Figure 14 presents BER curves of gradient (20,5), Newton 
(5,5), and joint MPD. The channel is also a 1 — D channel. 



as in the case of Fig 13 From Fig 14 firstly, we can see that 
interior point decoding yields a better decoding performance 
than that of joint MPD. The difference is approximately 



1.5dB (Newton v.s. joint MPD) at bit error probabiUty 10""'. 
Secondly, it is observed that Newton method gives smaller bit 
error probabilities than those obtained by using the gradient 
descent method. 
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Fig. 14. BER curves of interior point decoding for I — D channel (2). 



Figure [15] shows BER curves for a long tail PR channel 
{S — 16). Note that joint MPD cannot be applied to such a 
PR channel with large number of states (i.e., PR channel with 
long tail PR coefficients) because of the huge computational 



costs of BCJR computation, and so Fig 15 does not include 
the BER curve for joint MPD. The PR coefficients of this long 
tail PR channel are 



{^0, h2, ■ 
{1.000, 
-1.438, 
0.197, 



,^16} = 

0.253, 
-0.910, 
-0.743, 



-0.293, 
0.106, 
0.490, 



0.084, 
-0.600, 
-0.070, 



-0.057, 0.992, 
-0.844, 0.018, 
1.43}. 



The coefficients {hi, . . . hig} are samples of a Gaussian ran- 
dom variable with mean and variance 1 . We can see that the 
interior point decoding certainly has the capability to decode 
a given received vector observed from such a long tail PR 
channel. This can be considered as an advantage of the interior 
point decoding. Comparing the results of the gradient (20,5) 
and Newton (5,5) methods, the latter gains approximately 1.5 
dB gain at a bit error probability of 10^^. 



Figure 16 deals with the case of an EPR4 channel {5 = 
3,/io = hi ~ l,h2 = = —1). In this case, among 
three decoding algorithms (gradient descent (20,5), Newton 
(5,5), and joint MPD), joint MPD yields the best decoding 
performance. A large gap (1.8dB at BER = 10^**) exists 
between the BER curves of joint MPD and Newton (5,5). 
This performance degradation may be explained from the 
geometrical view point. Some vertices of the mapped polytope 
associated with this channel would have a very thin decision 
region. 



Figure 17 presents the case where the PR coefficients are 
{ho = hi = h-z = 1, = —1). This channel and the EPR4 
channel have the same degree, 5 = 3, with the channels 
differing only in the sign of the coefficient /12. From Fig 17 
we can see that interior point decoding offers smaller bit error 
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Fig. 16. BER curves of interior point decoding for an EPR4 channel {hg = 
hi = 1, /i2 = /13 = -1). 



probabilities than those of joint MPD across the entire range 
of SNR. These results suggest that the decoding performance 
of the interior point decoding is highly dependent on the PR 
coefficients of the channel. 

Table |l]presents the throughput of the software implemented 
decoders. Here the throughput of the decoder is defined as the 
number of codewords tested in a second. Namely, it includes 
the time for encoding, noise generation, and decoding. It is 
fair to say that the values of throughput highly depend on 
the implementation and thus it should be considered as rough 
estimates of the decoding complexity. From Table |l] we can 
observe that the proposed algorithm gives a higher throughput 
than that of joint MPD. It is also interesting to see that Newton 
(5,5) achieves a much higher throughput than that of gradient 
descent (20,5). This improvement on throughput is mainly 
due to the faster convergence of the Newton method, which 
can compensate the additional complexity (solving the Newton 
equation) required by this method. Note that a throughput of 
1715 blocks/sec corresponds to 583 /i second per codeword. 



TABLE 1 

Throughput of Software Implemented Decoders. 

decoder throughput (blocks/sec) 

gradient descent (20,5) 706 
Newton with Cholesky (5,5) 1715 
Newton with Jacobi (5,5) 1506 
joint MPD 359 

Channel: 1 - D channel, SNR = 6 dB 
Code: n = 204, m = 102, Wc = 3,Wr = 6 
Computer environment: Mac Pro with Intel Xeon 2.0 GHz 



VIII. Conclusion 

In this paper, the interior point decoding for linear vector 
channels is presented. The proposed algorithm is based on the 
principle of a relaxed MLD rule which is a convex optimiza- 
tion problem. Approximate variations of the gradient descent 
and the Newton methods play a key role in the interior point 
algorithm to solve the convex optimization problem. Several 
efficient implementation techniques have been developed in 
order to realize a decoder with reasonable computational 
complexity. The proposed algorithm may be applicable to 
various kinds of channels with memory, such as channels 
with additive correlated noise, MIMO channels, and 2D-ISI 
channels. 

Error analysis based on the geometrical properties of the 
fundamental polytope is given as well. The decision regions 
of a relaxed ML decoder can be characterized by the normal 
cones of the affine image of the fundamental polytope. This 
geometrical view helps us to understand the behavior of a sub- 
optimal relaxed ML decoder based on convex optimization. 

As a matter of fact, we cannot simply conclude that the 
optimization approach is superior to the message passing 
approach. There are some cases where joint MPD overcomes 
the proposed algorithm (e.g, EPR4 case). Furthermore, there 
exist other configurations of a joint MPD (see [12]) which 
have not been tested in this paper. 

However, the simulation results presented in the previous 
section are encouraging and they show the potential of the 
optimization approach. Compared with a conventional joint 



MPD, the proposed decoding algorithm achieves better BER 
performance with less decoding complexity in the case of 
PR channels in many cases. An advantage of the proposed 
algorithm is that it is capable of handling the channels with 
long memory. 

The present paper discusses a scheme based on a relatively 
simple interior point algorithm (i.e., barrier function method). 
More sophisticated convex optimization algorithms with faster 
convergence property [4] [15] can be considered in future. For 
example, recently, Vontobel [6] presented a new interior point 
algorithm for linear programming decoding. The algorithm 
is based on the primal-dual interior point algorithm which 
appears promising in terms of the convergence speed. 

The efficient implementation techniques (fast feasibility 
check, approximate gradient, approximate Hessian) developed 
in this paper and the formulation of the barrier function 
representing the fundamental polytope could be useful not only 
in the proposed algorithm but also in forthcoming decoding 
algorithms based on various types of the interior point algo- 
rithm. 
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